Abstract

We introduce and analyze a minimal network model of semantic memory in the human brain.
The model is a global associative memory structured as a collection of local modules, each
coding a feature, which can take S possible values, with a global sparseness a (the average fraction
of features describing a concept). We show that, under optimal conditions, the number $c_M$ of
modules connected on average to a module can range widely between very sparse connectivity
(high dilution, $c_M/N \to 0$) and full connectivity ($c_M \to N$), maintaining a global network storage
capacity (the maximum number $p_c$ of stored and retrievable concepts) that scales like $p_c \sim c_M S^2/a$,
with logarithmic corrections consistent with the constraint that each synapse may store up to a
fraction of a bit.

I. INTRODUCTION

Hebbian associative plasticity appears to be the major mechanism responsible for sculpting
connections between pyramidal neurons in the cortex, for both short- and long-range
systems of synapses. This and other lines of evidence [1] suggest that autoassociative memory
retrieval is a general mechanism in the cortex, occurring not only at the level of local
networks, but also in higher order processes involving many cortical areas. These areas
are often regarded, from both the anatomical and the functional point of view, as distinct
but interacting modules, indicating that in order to model higher order processes we
must first understand better how multimodular autoassociative memories may operate. In a
class of models conceived along these lines, neurons in local modules, interconnected through
short-range synapses, are capable of retrieving local activity patterns which, combined across
the cortex and interacting through long-range synapses, compose global states of activity.
Since long-range synapses are also modified by associative plasticity, these states can be
driven by attractor dynamics, and such networks are capable of retrieving previously learned
global patterns.

This could serve as a simple model of semantic memory retrieval. The semantic memory 
system, as opposed to episodic memory, stores composite concepts, e.g. objects, and their 
relationships. Although information about distinct features pertaining to a given object 
(e.g. its shape, smell, texture, function) may be processed in different areas of the cortex, 
a cue including only some of the features, e.g. the shape and color, may suffice to elicit 
retrieval of the entire memory representation of the object. Imaging studies show that,
though distributed across the cortex, this activity is sparse and selective, and might involve
regions associated with the concept being retrieved, even if not directly activated by the cue.
This process could well fit a description in terms of autoassociative multimodular memory
retrieval. In this perspective, while a local module codes for diverse values of a given feature,
a combination of features gives rise to a concept, which behaves as an attractor of the global
network and can thus be retrieved. The two-level description that characterizes this
view is the principal difference from other attempts to describe semantic memory in terms
of featural representations.

In order to reduce the complexity of a full multimodular model, one can consider
a minimal model of semantic memory, which can be thought of as a global autoassociative
memory in which the units, instead of representing, as usual, individual neurons, represent
local cortical networks retrieving one of various (S) possible states of activity. The combined
activity of these units generates a global state, which follows a retrieval dynamics. The first
question arising from this proposal is how the global storage capacity of such a network is
related to the different local and global parameters.

In the following section of this paper we present the model in mathematical terms. In the 
third section we compare, through a simple signal-to-noise analysis, different model variants 
proposed in the literature and extract the minimum requirements for a network of this kind 
to perform efficiently in terms of storage capacity. In the fourth section we analyze with 
more sophisticated techniques the simplest model endowed with a large capacity (the sparse 
Potts model) and, in particular, interesting cases such as the very sparse and the high-S
limits. Following this we study modifications to the model that make it more realistic in 
terms of connectivity. Finally, we relate the results from the previous sections to a simple 
information capacity analysis. 

II. S-STATE FULLY CONNECTED NETWORKS

Autoassociative memories are networks of units connected to one another by weighted 
synapses. These synapses are trained in such a way that the network presents, in the ideal 
case, a number p of preassigned attractor states, also called stored patterns, or memories, 
represented by the vectors $\xi^\mu$ with $\mu = 1 \ldots p$. If the state of the network is forced into the
vicinity of an attractor (e.g., by presenting a cue correlated with one of the stored patterns) 
the natural dynamics of the network converges toward the attractor, in state space, and the 
memory item is said to be retrieved. A substantial amount of the literature on attractor 
networks is devoted to study the relationship between the number and type of stored patterns 
and the quality of retrieval. 

The state of a network at a given moment is given by the state of each of its units, $\sigma_i$
for i = 1...N. The first quantitative analyses of autoassociative memories were of binary
models in which units could reach two possible states, +1 (active unit) and -1 (inactive
unit), resembling Ising spins. In our case, in which units do not represent single neurons
but rather local networks, we want active units to be able to reach one of S possible states,
while inactive units remain in a 'zero' state. We thus choose the notation $\sigma_i = k$ for an
active unit in state k and $\sigma_i = 0$ for an inactive unit. This particular choice has no effect
on the results, since all quantities can be transformed to some other notation. On the other
hand, the stored patterns $\xi^\mu$ can be simply thought of as special states of the network. For
this reason, it is natural to choose the same kind of representation for the activity of a unit
i in pattern $\mu$, $\xi_i^\mu$.

Although in the first binary models of autoassociative memories patterns were constructed
with a distribution of equally probable active and inactive units, the search for an
accurate description of activity in the brain made it necessary to introduce sparse representations.
This property of autoassociative memories is described by the sparseness a, defined
as the average activity (the average fraction of active units) in the stored patterns. In our
case, because we are assuming all S different activity states to be equally probable, we
consider patterns defined by the following probability distribution

$$
\begin{aligned}
P(\xi_i^\mu = 0) &= 1 - a \\
P(\xi_i^\mu = k) &= \tilde{a} \equiv a/S
\end{aligned}
\tag{1}
$$

for any active state k. In this way the probability to find an active unit in a pattern is the
sparseness a. For sparse codes, this quantity is closer to 0 than to 1.
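
The pattern statistics described above can be illustrated with a minimal sampling sketch. It assumes that each active state has probability a/S, as implied by S equiprobable active states and total activity a; the function name `sample_patterns` and the integer coding (0 for the zero state, 1..S for active states) are illustrative choices, not notation from the paper.

```python
import numpy as np

def sample_patterns(p, N, S, a, rng=None):
    """Draw p patterns over N units; 0 = inactive, 1..S = active states (assumed coding)."""
    rng = np.random.default_rng(rng)
    active = rng.random((p, N)) < a                 # each unit is active with probability a
    states = rng.integers(1, S + 1, size=(p, N))    # uniform choice among the S active states
    return np.where(active, states, 0)

xi = sample_patterns(p=200, N=1000, S=5, a=0.2, rng=0)
print("fraction of active units:", (xi > 0).mean())
print("fraction of units in state 3:", (xi == 3).mean())
```

With these values the first fraction should be close to a = 0.2 and the second close to a/S = 0.04, up to sampling fluctuations.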

Following the assumption of Hebbian learning and, as is usual for a simplified analysis,
symmetry in the weights ($J_{ij}^{kl} = J_{ji}^{lk}$), a general form for the weights is

$$ J_{ij}^{kl} = \frac{1}{E} \sum_{\mu=1}^{p} v_{\xi_i^\mu k}\, v_{\xi_j^\mu l} \tag{2} $$

where E is some normalization constant and $v_{mn}$ is an operator computing interactions
between two states.

As one can notice, the long-range synapse weights in Eq. 2 have different values for
different pre- and post-synaptic states k and l. In this way we do not intend to model the
actual distribution of synapses going from one cortical area to another (since they connect
neurons and not abstract states), but rather the general mechanism of communication between
these areas. A recent study has raised the issue of finding the
most suitable description of global cortical networks in terms of single long-range synapses
connecting distant local areas. Applying statistical tools (Dynamic Causal Modeling), its authors
propose that MRI data can be described as produced by networks with category specific
forward connections, roughly the kind of connections modelled by Eq. 2.
The state of a generic unit i is determined by its local fields $h_i^k$, which sum the influences
of the other units in the network and are defined as

$$ h_i^k = \sum_{j \neq i}^{N} \sum_{l=0}^{S} J_{ij}^{kl}\, u_{\sigma_j l} - U (1 - \delta_{k0}) \tag{3} $$

where we introduce the operators $u_{mn}$, analogous to $v_{mn}$, and a second (threshold) term,
which has the function of regulating the activity level across the network. The unit
i updates its state $\sigma_i$, with an asynchronous dynamics, in order to maximize the local field
$h_i^{\sigma_i}$. In the general case, the probability to choose the state k is defined as

$$ P(\sigma_i = k) = \frac{\exp(\beta h_i^k)}{\sum_{l=0}^{S} \exp(\beta h_i^l)} $$

where $\beta$ is a parameter analogous to an inverse temperature.
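
The asynchronous update described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the weights are stored as a hypothetical array `J[i, k, j, l]` with no self-coupling, the threshold is assumed to enter as -U for every active state k, and the u operator is assumed to take the form kappa_u * delta + lambda_u. The state k is then drawn with probability proportional to exp(beta * h_k).

```python
import numpy as np

def local_fields(J, sigma, i, U, kappa_u=1.0, lambda_u=0.0):
    """h[k] = sum_{j,l} J[i,k,j,l] * u(sigma_j, l) - U for k != 0 (assumed threshold form)."""
    n_states = J.shape[1]                                    # S + 1, including the zero state
    u = kappa_u * (sigma[:, None] == np.arange(n_states)) + lambda_u   # shape (N, S+1)
    h = np.einsum("kjl,jl->k", J[i], u)
    h[1:] -= U                                               # threshold acts on active states only
    return h

def update_unit(J, sigma, i, U, beta, rng):
    """Asynchronous stochastic update of unit i (modifies sigma in place)."""
    h = local_fields(J, sigma, i, U)
    w = np.exp(beta * (h - h.max()))                         # subtract the max for numerical stability
    sigma[i] = rng.choice(len(h), p=w / w.sum())

rng = np.random.default_rng(0)
N, S = 6, 3
J = rng.normal(size=(N, S + 1, N, S + 1))                    # random weights, for illustration only
J[np.arange(N), :, np.arange(N), :] = 0.0                    # no self-coupling
sigma = rng.integers(0, S + 1, size=N)
update_unit(J, sigma, i=2, U=0.5, beta=2.0, rng=rng)
print(sigma)
```

For large beta the rule reduces to picking the state with the largest local field, which is the zero-temperature dynamics.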

Finally, we can include all of these elements, as is usual for the study of attractor networks,
into a Hamiltonian framework. The Hamiltonian representation of binary networks can be
extended to S-state models as

$$ \mathcal{H} = -\frac{1}{2} \sum_{i, j \neq i}^{N} \sum_{k,l}^{S} J_{ij}^{kl}\, u_{\sigma_i k}\, u_{\sigma_j l} + U \sum_{i}^{N} \sum_{k \neq 0}^{S} u_{\sigma_i k} \tag{4} $$

Note that for the case S = 1, Eq. 4 generalizes the Hamiltonians used in binary networks,
given appropriate definitions of the weights $J_{ij}^{kl}$ and of the operators $u_{mn}$.

We now specify a form for the $u_{mn}$ and $v_{mn}$ operators. In the simplest and most symmetric
case these operators have two alternative values, depending on whether m and n are equal
or different states

$$
\begin{aligned}
u_{mn} &= \kappa_u \delta_{mn} + \lambda_u \\
v_{mn} &= (\kappa_v \delta_{mn} + \lambda_v)(1 - \delta_{n0})
\end{aligned}
\tag{5}
$$

where we have introduced four parameters. Particular choices for these parameters define
the different models in which we are interested, including several proposed in the literature.
In the v operators, which define the value of the weights, we have included a factor which
ensures $J_{ij}^{kl} = 0$ if either k or l is the zero state, to implement the idea that Hebbian
learning occurs only with active states. As we will see below, this appears to be a crucial
element in the model.
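
As a concrete illustration of the operators and weights just described, the sketch below builds the Hebbian weight tensor from a set of patterns. It assumes the v operator has the form (kappa_v * delta_mn + lambda_v)(1 - delta_n0) and that the weights are the pattern sum of products of v operators divided by E; the array layout `J[i, k, j, l]` and the function names are hypothetical.

```python
import numpy as np

def v_op(xi, S, kappa_v, lambda_v):
    """v[..., k] = (kappa_v * delta(xi, k) + lambda_v) * (1 - delta(k, 0)), k = 0..S."""
    k = np.arange(S + 1)
    v = kappa_v * (xi[..., None] == k) + lambda_v
    v[..., 0] = 0.0                      # Hebbian learning only involves active states
    return v

def hebbian_weights(xi, S, kappa_v, lambda_v, E):
    """J[i, k, j, l] = (1/E) * sum_mu v(xi_i^mu, k) * v(xi_j^mu, l), without self-coupling."""
    v = v_op(xi, S, kappa_v, lambda_v)   # shape (p, N, S+1)
    J = np.einsum("mik,mjl->ikjl", v, v) / E
    N = xi.shape[1]
    J[np.arange(N), :, np.arange(N), :] = 0.0
    return J

rng = np.random.default_rng(0)
p, N, S, a = 4, 20, 3, 0.3
xi = np.where(rng.random((p, N)) < a, rng.integers(1, S + 1, size=(p, N)), 0)
# Illustrative parameter values: kappa_v = 1, lambda_v = -a/S, E = N a (1 - a/S).
J = hebbian_weights(xi, S, kappa_v=1.0, lambda_v=-a / S, E=N * a * (1 - a / S))
print(J.shape)
```

By construction every weight with k = 0 or l = 0 vanishes, and the tensor is symmetric under the simultaneous exchange of (i, k) and (j, l).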
III. SIGNAL-TO-NOISE ANALYSES

We now show that, within the group of models defined in the previous section, there is
a family (which we call 'well behaved') that exploits multiple states and sparseness in an
optimal way in terms of storage capacity or, as usual, of $\alpha = p/N$. We begin by applying
an adjusted version of previously developed arguments.

A signal-to-noise analysis is a simplified way to estimate the stability of stored patterns
by studying what happens to a generic unit i during the perfect retrieval of a given pattern,
assessing whether the state of this unit is likely to be stable or not. We can choose this
retrieved pattern to be $\mu = 1$ without loss of generality. Eq. 3 can then be rewritten as

$$ h_i^k = v_{\xi_i^1 k} \frac{1}{E} \sum_{j \neq i} \sum_{l} v_{\xi_j^1 l}\, u_{\xi_j^1 l} + \sum_{\mu > 1} v_{\xi_i^\mu k} \frac{1}{E} \sum_{j \neq i} \sum_{l} v_{\xi_j^\mu l}\, u_{\xi_j^1 l} - U (1 - \delta_{k0}) \tag{6} $$

where the terms in the RHS stand for signal ($\varsigma$), noise ($\rho$) and threshold respectively. Generally
speaking, if the field had only the signal part then the state would be stable, but the
noise can destabilize it.

As usual in this kind of analysis, we consider the contribution of the noise term in Eq. 6
as if it were a normally distributed random variable, i.e. through its average and its standard
deviation. In general both quantities scale like p, but in some special cases the average noise
is zero and the standard deviation scales only like $\sqrt{p}$, which means that one can store more
patterns, as the noise level is reduced. It is clear that the well behaved family of models
which we are looking for must fit into this favorable situation. As we said, a necessary but
not sufficient condition is that the average of the noise be zero. There are two ways of imposing
this on the model. The first way is to make $\lambda_u = -\tilde{a} \kappa_u$, but in this case the standard
deviation still scales like p. The second way is to use

$$ \lambda_v = -\tilde{a} \kappa_v \tag{7} $$

which makes the standard deviation scale like $\sqrt{p}$. Including this condition, the average
signal and the standard deviation of the noise are
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where terms of order have been discarded. 

The storage capacity $\alpha_c$ can be estimated as the largest value of $\alpha$ for which $h_i^{\xi_i^1}$ is still
likely to be the largest among all S + 1 local fields. The situation is quite different depending
on whether $\xi_i^1$ is in an active state or not, so one needs to analyze both cases. Note first
that $h_i^0 = 0$, so if $\xi_i^1 = 0$ the rest of the local fields must be negative. For this to hold true
at least within one standard deviation of the noise distribution we require $\varsigma - U \pm \rho < 0$, or
in other words


a + 



U E 



> 



NKlKua{l — a) \^ 



aa 



where we have adopted a positive k„. 

In the case in which $\xi_i^1$ is not the zero state two conditions must be fulfilled, namely
$h_i^{\xi_i^1} > h_i^0$ and $h_i^{\xi_i^1} > h_i^k$ for every other active state k. These conditions can be condensed into

The most stringent of these two conditions determines $\alpha_c$. By choosing a suitable threshold
$U = \frac{N}{E} \kappa_v^2 \kappa_u a (1 - \tilde{a}) \left[ \frac{1}{2} - \tilde{a} \right]$ both conditions are made equivalent, thus optimizing the storage
capacity. This choice determines a storage capacity of

-1 




1-1 



(8) 



Note that the expression between curly brackets is equal to or greater than 1 — a. As 
a consequence, the system remains optimal as long as this expression remains of order 1, 



which, considering always a to be closer to than to 1, occurs when the expression I 1 — ^ 



remains of order 1. For this to be true we must impose 



All QiKjq. 



(9) 



We thus define the well behaved models as those which fulfil the conditions given by Eq.
7 and Eq. 9. This simple analysis indicates that the storage capacity of models in the well
behaved family scales like $S^2/a$.
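
A quick numerical check may help to see why the choice of the v operator matters for the noise. The sketch below estimates the pattern average of the v operator for a fixed active state by Monte Carlo, assuming the operator form kappa_v * delta + lambda_v and the zero-mean condition lambda_v = -(a/S) * kappa_v; both are assumptions made explicit here, and `mean_v` is a hypothetical helper.

```python
import numpy as np

def mean_v(S, a, kappa_v, lambda_v, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the pattern average of v(xi, k) for the active state k = 1."""
    rng = np.random.default_rng(seed)
    active = rng.random(n_samples) < a
    xi = np.where(active, rng.integers(1, S + 1, n_samples), 0)
    return np.mean(kappa_v * (xi == 1) + lambda_v)

S, a = 5, 0.2
a_tilde = a / S
print("with lambda_v = -a_tilde * kappa_v:", mean_v(S, a, 1.0, -a_tilde))
print("with lambda_v = 0:                 ", mean_v(S, a, 1.0, 0.0))
```

In the first case the average is close to zero, so each non-retrieved pattern contributes a zero-mean term to the noise; in the second it is close to a/S, and the contributions of different patterns add up coherently.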

In the following subsections we examine different models proposed in literature, both 
within and outside the well behaved family. 



A. Symmetric Potts model 

The symmetric Potts model was the first S-state neural network to be proposed. Its
units can reach S equivalent states but no zero state. Though simple, a model constructed 
with these elements is enough to show the behavior of the storage capacity, as we will 
see. It is defined by setting 

a = 1
U = 0

two conditions related to each other (if there is no zero state, the selectivity mechanism
provided by the threshold is not necessary). Moreover $E = S^2 N$, which is just a normalization,
and

$$ \lambda_u = \lambda_v = -1 $$

The conditions given by Eq. 7 and Eq. 9 are fulfilled, and the storage capacity in Eq. 8 is
approximately



g-2 



'1 

A 

provided S is large enough. The symmetric Potts model is then a well behaved model of 
sparseness a = 1. 

This model is studied analytically with replica tools in [11], where the author finds an
S(S - 1) behavior of the storage capacity for low values of S. Unfortunately, the cited
work lacks an analysis for high values of S, which is the interesting limit for modeling
multi-modular networks. It is not too difficult, however, to clarify the behavior in this limit.

The replica storage capacity is defined as the highest value of a for which there is a 
solution to the equation 

-1 + S J Dz[<l^iz + y)f-^ ^^^^ 



yi^pl + / zDzmz + + iS- mz - yMz)]S^'} 

where 



(11) 



Throughout this paper we use the gaussian differential $Dz = \frac{e^{-z^2/2}}{\sqrt{2\pi}}\, dz$, and the integration
limits, if not specified, are $-\infty$ and $\infty$.
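
Integrals over the gaussian differential Dz recur throughout the replica equations, and can be evaluated numerically with Gauss-Hermite quadrature. The sketch below is one possible way to do it with NumPy's `hermegauss`, whose nodes and weights correspond to the weight function exp(-z^2/2); the helper name `gauss_average` and the number of nodes are arbitrary choices. Integrands containing sharp steps (such as Heaviside functions) would need more nodes or a different scheme.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gauss_average(f, n_nodes=60):
    """Approximate the integral of f(z) Dz, with Dz = exp(-z^2/2) dz / sqrt(2 pi)."""
    z, w = hermegauss(n_nodes)            # quadrature rule for the weight exp(-z^2/2)
    return np.sum(w * f(z)) / np.sqrt(2.0 * np.pi)

print(gauss_average(lambda z: z**2))      # second moment of a unit gaussian
print(gauss_average(np.cos))              # equals exp(-1/2) analytically
```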
We note that in Eq. 10 expressions of the form $[\phi(z)]^S$ can be approximated by displaced
Heaviside functions for high values of S. Using this we obtain an approximated analytical
expression for the storage capacity

2 

(12) 

The factor between brackets in this equation behaves like $\ln(S)^{-1}$ for high values of S,
which means that the correction for high S to Kanter's low S approximation is a factor of
order $\ln(S)^{-1}$.

We show in Fig. 1 the results of simulations of a symmetric Potts network (N = 100)
contrasted with Kanter's low S approximation and our own high S approximation of Eq.
12. The analytical predictions fit tightly the results of the simulations, both for low and
high S.
Figure 1: Storage capacity of a symmetric Potts network of N = 100 units for increasing S. Both
axes are logarithmic. Black dots show numerical solutions for Eq. 10, which overlap almost perfectly
with the simulations (plus signs). For low values of S (S < 50) Kanter's low S approximation fits
well, while the high values of S are well fitted by Eq. 12.
B. Biased Potts model

This model was proposed and studied as an extension of the symmetric Potts
model to an S-state network with arbitrary probability distribution for the states of the
units in stored patterns. We adapt the original formalism to the case of S equivalent states, a zero
state and sparseness a. The parameters are then

U =0 
E = N 

(13) 

^mn i^^mn P-n) 

where $P_k$ is the probability of a unit in the stored patterns to be in state k. This model does
not fit exactly our description because the v operators generate weights $J_{ij}^{kl}$ that are not
necessarily zero when k or l is zero. The signal to noise analysis for this situation shows
a very poor storage capacity, scaling like a^. If one adds a non-zero threshold (f/ ~ a 5* 
in the optimal case) the storage capacity grows but remains of order 1. These two results 
show that allowing for non-zero weights to connect zero states is a drawback for the system.
The poor performance can, however, be improved by multiplying the v operators by the
corresponding $(1 - \delta_{n0})$ factors, and by adding a threshold. In this way, instead of Eq. 13 we
introduce our definition, Eq. 5, for the v operators, with the values for the $\kappa$'s and $\lambda$'s arising
naturally from the model as

$$
\begin{aligned}
\kappa_u &= S + 1 \\
\lambda_u &= -1 \\
\kappa_v &= 1 \\
\lambda_v &= -\tilde{a} \\
U &\sim aS
\end{aligned}
$$

As in the symmetric Potts model, the condition given by Eq. 7 is fulfilled. However, the
second condition (Eq. 9) can be approximated for high S by

a>l/(l + l/S)~l 

which does not hold true for sparse coding. If, instead, $a \ll 1$, the critical value of $\alpha$ in
Eq. 8 can be approximated as

Hence the storage capacity of the biased Potts model can be preserved close to optimal by
imposing an ad hoc relation between two parameters that are a priori independent, to assure
$1 \ll aS$. In this particular situation the model is well behaved. In the opposite limit, when
a S <^ 1, the storage capacity scales like S^, which is inferior to the S'^/a behavior of the 
well behaved family. 



C. Sparse Potts model

The simplest version of a well behaved model is perhaps the one introduced as a model
for semantic memory with the parameter values

$$
\begin{aligned}
E &= N a (1 - \tilde{a}) \\
\lambda_u &= 0 \\
\lambda_v &= -\tilde{a} \\
U &\sim 1/2
\end{aligned}
$$

With these parameters, the sparse Potts model is clearly well behaved, and the storage
capacity in Eq. 8 becomes

^2 



IV. REPLICA ANALYSIS 



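Having introduced a simple model with optimal storage capacity, we can proceed to
analyze the corrections to the signal-to-noise estimation by treating the problem in a more
refined way with the classical replica method. The Hamiltonian in Eq. 4 can be rewritten
for the sparse Potts model as

$$ \mathcal{H} = -\frac{1}{2} \sum_{i, j \neq i}^{N} \sum_{k,l}^{S} J_{ij}^{kl}\, \delta_{\sigma_i k}\, \delta_{\sigma_j l} + U \sum_{i}^{N} (1 - \delta_{\sigma_i 0}) $$

with

$$ J_{ij}^{kl} = \frac{1}{N a (1 - \tilde{a})} \sum_{\mu=1}^{p} v_{\xi_i^\mu k}\, v_{\xi_j^\mu l} $$

constructed using

$$ v_{\xi_i^\mu k} = (\delta_{\xi_i^\mu k} - \tilde{a})(1 - \delta_{k0}) $$

We consider the limit $p \to \infty$ and $N \to \infty$ with the ratio $\alpha = p/N$ fixed. Patterns with
index $\nu$ ($\mu$) are condensed (not condensed). Following the replica analysis [2] the free energy
can be calculated as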



" Tr (ln[a(l - - /3Sq)]) + ^ ^ + « + f/ 5) ^ 



^ p,A=l p=l 

where $P_k$ is the probability of a neuron to be in state k in a stored pattern, as defined in
Eq. 1. The order parameters m stand for the overlaps of the states with different patterns,
and $q_{\rho\lambda}$ is analogous to the Edwards-Anderson parameter, with the following definitions

N 



1=1 

N 



i=l \ \ k 



in such a way that they are all of order 1. Consider, for example, that if af = for all i then 
= 1 on average, while = on average if both quantities are independent variables. 
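We now make two assumptions. First, we consider for simplicity that there is only one
condensed pattern, making the index $\nu$ superfluous. Second, we assume that there is replica
symmetry, and substitute

$$
\begin{aligned}
m_\rho &= m \\
q_{\rho\lambda} &= \begin{cases} q & \text{if } \rho \neq \lambda \\ \tilde{q} & \text{if } \rho = \lambda \end{cases} \\
r_{\rho\lambda} &= \begin{cases} r & \text{if } \rho \neq \lambda \\ \tilde{r} & \text{if } \rho = \lambda \end{cases}
\end{aligned}
$$

Taking this into account, we arrive at the final expression for the free energy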
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ln(a(l - a)) + ln(l - hC) - 
where the finite-valued variable C has been introduced

$$ C = \beta (\tilde{q} - q) $$

in such a way that it is of order 1 and

aa /3(r — f) 



Hi 



Note that nl = 0. 



We now derive the fixed-point equation for m as an example of how the limit $\beta \to \infty$ is
taken. The equation for finite $\beta$ is



1 



m 



a(l — a) 



Dz 



1 



1 + Ep^.exp{/3(H^-H|)} 



(14) 



OO IS 



In the limit /3 ^ oo the expression between brackets is 1 if Tt^ > for every p ^ a and 
otherwise. It can be thus expressed as a product of Heaviside functions. The equation for 
m at zero temperature is then 



m 
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In the same way we derive the rest of the fixed point equations at zero temperature 



c 



1 



(15) 



r 



r = 



(l-aC)2 



P(r -r) = 2U 



c 



l-aC 



The differences between r and $\tilde{r}$, and between q and $\tilde{q}$, are of order $1/\beta$. From the last equation
it can be seen that the threshold U has the effect of changing the sign of $(r - \tilde{r})$ and allowing
$\alpha$ to scale like $S^2/a$ with the variables C, r and $\tilde{r}$, as we have said, of order 1 with respect to
a and S.

A. Reduced saddle-point equations 

It is possible to calculate the averages in Eqs. 15 by reducing the problem to the following
variables, which represent respectively signal and noise contributions



where we have introduced the normalized (order 1) storage capacity $\tilde{\alpha} = \alpha a/S^2$, which
clarifies that both variables x and y are also of order 1.
At the saddle point, using Eqs. 15 we obtain



which shows that the relevant quantities to describe the system are m, q, and $C\sqrt{r}$. Following
this we compute the averages and get from Eqs. 15 the corresponding equations in terms of






(16) 
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y and x 

q = il^ / Dw I Dzct>{zf+ 



a 



ya+x—i\/d,u 

Dz(t){z + yf + {S-l) I Dw i Dz<p{z - y)(t){zf-^ 

y{l—a)+x—i\/5,w J J ya+x—iy/hw 



m = I Dw I Dz(f){z + y) — q 



= ^ (^^^ / Dw r Dz{z + zVdw)(i){zf+ 

— a) I a J Jya+x-iVKw 

/poo 
Dw / Dz{z + iVEw)(j){z + yf + 

J —y{l—a)+x—i\/d,w 

+ {S-1)IdwI Dz{z + iVlw)(p{z-y)(P{zf-^\ (17) 

Putting together Eqs. 16 and Eqs. 17 one can construct the system of two equations
that determine the storage capacity. We show an example of their solution in Fig. 2 for
the parameters U = 0.5, S = 5 and varying sparseness, contrasting it with simulations of a
network of N = 5000 units. This figure shows quite a good agreement between simulations
and numerical solutions for a region of the sparseness parameter a, whereas for a < 0.3 finite
size effects appear, resulting in a lower storage capacity than predicted theoretically.
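
Simulations of the kind mentioned above can be sketched as follows. This is a toy version, not the authors' code: it assumes sparse Potts weights built from v = (delta - a/S)(1 - delta_k0) with normalization E = N a (1 - a/S), zero-temperature asynchronous updates, a start from the stored pattern itself, and an overlap normalized by the number of active units in that pattern. The network here is far smaller than N = 5000, and the helper names are hypothetical.

```python
import numpy as np

def sample_patterns(p, N, S, a, rng):
    """p patterns over N units: 0 = inactive, 1..S = equiprobable active states."""
    active = rng.random((p, N)) < a
    return np.where(active, rng.integers(1, S + 1, size=(p, N)), 0)

def retrieve(xi, S, a, U=0.5, sweeps=5, seed=0):
    """Run zero-temperature asynchronous dynamics from pattern 0; return the final overlap."""
    rng = np.random.default_rng(seed)
    p, N = xi.shape
    a_t = a / S
    E = N * a * (1.0 - a_t)                        # assumed normalization of the weights
    v = (xi[..., None] == np.arange(S + 1)) - a_t  # v[mu, i, k] = delta(xi_i^mu, k) - a_tilde
    v[..., 0] = 0.0                                # the zero state carries no weight
    sigma = xi[0].copy()
    idx = np.arange(N)
    for _ in range(sweeps):
        for i in rng.permutation(N):
            vs = v[:, idx, sigma]                  # v(xi_j^mu, sigma_j), shape (p, N)
            s = vs.sum(axis=1) - vs[:, i]          # sum over j != i, one value per pattern
            h = v[:, i, :].T @ s / E               # local fields for k = 0..S
            h[1:] -= U                             # threshold on active states
            sigma[i] = int(np.argmax(h))
    n_active = (xi[0] > 0).sum()                   # assumed nonzero for this sketch
    return v[0, idx, sigma].sum() / (n_active * (1.0 - a_t))

rng = np.random.default_rng(1)
xi = sample_patterns(p=5, N=300, S=3, a=0.3, rng=rng)
print("overlap after retrieval:", retrieve(xi, S=3, a=0.3, sweeps=3))
```

Repeating the run while increasing p until the overlap collapses gives a rough numerical estimate of the storage capacity, which for small N is subject to the finite size effects noted in the text.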

B. Limit case 

Given that the equations presented in the previous subsection are quite complex, we now
analyze the simpler and interesting limit case $\tilde{a} \ll 1$. Though it is not evident from the
equations, the normalized storage capacity $\tilde{\alpha}_c$ goes to zero in a logarithmic way as $\tilde{a}$ goes
to zero, which means that the storage capacity is not as high as the simple signal to noise
analysis of Section III might suggest. Our analysis of the replica equations for the symmetric
Potts model (Eq. 12) showing logarithmic corrections is an example of this. We now analyze
as another example the sparse Potts model in the case U = 0.5.

For the limit of $\tilde{a} \ll 1$ one can approximate Eqs. 17 by

m ^ (f){y — x) (18a) 
q ~ + _ a;) (185) 
Figure 2: Dependence of the storage capacity of a sparse Potts network of N = 5000 units on the
sparseness a, for S = 5 and U = 0.5. The black dots show numerical solutions of Eqs. 16 and Eqs. 17, while the red line
shows the result of simulations. For very sparse simulations (low values of a) finite size effects are
observed, which make the storage capacity lower than predicted by the equations.

which is still quite a complex system. We can now make some self consistent assumptions. 
First we note that, considering x and y as variables that diverge logarithmically as a goes 
to zero, Eqs. \lSh\ and I18d indicate that ^/q ^ C^/r. Second, for U = 1/2 it is possible to 



consider x ~ ?/, and thus, from Eq. llHfll y ^ l/\^25: and x ^ e/ V25:, where e is a correcting 

factor for x which is close to 1. With this in mind, and taking into account that a goes to 
zero with a, we can approximate Eq. 11861 and Eq. llScI by keeping only the second term in 
the first case and only the first term in the second. The equations for y and x can be derived 
from Eqs. 11861 and Eqs. [HI 







(19) 
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Replacing x by e/\/2& (and e by 1 where irrelevant)we can approximate a as 

1 



a 



4 In 



(20) 



Next, we posit that is the larger factor in the logarithm, while {2U — e)^^ gives a 
correction. A rough approximation for Oc is then 



Otr 



4 a In (I) 



(21) 



which, inserted in[Tni gives 



{2V -e) = (1-a) 



47r In I - 

a 



This expression can be re-inserted into [201 in order to get a more refined approximation 

^2 



4aln(iVWI)) 



(22) 



We show in Fig. 3 that the approximation given by Eq. 22 fits quite well the numerical
solution of the sparse Potts model's storage capacity, particularly for very low values of a.



V. DILUTED NETWORKS 

In this section we present two modifications to our model which make the network biologically
more plausible in terms of connectivity.

First, after considering, to a zeroth order approximation, the long range cortical network
as a fully connected network, we now wish to describe it, to a better approximation, as
a network in which the probability that two units are connected is $c_M/N$. Traditionally,
analytic studies have focused on two soluble cases: the fully connected, which we have
studied in the previous sections ($c_M = N$), and the highly diluted ($c_M \ll \log N$). A recent
work has shown, however, that the intermediate case is also analytically treatable and that
the storage capacity of an intermediate random network, regardless of the symmetry in the
weights, stands between the storage capacities of the limit cases. Supported by this
result, we will focus on the (easier) solution for the highly diluted case, and consider any
intermediate situation to be between the two limits.
Figure 3: Corrections to the $S^2/a$ behavior of the storage capacity of a sparse Potts network for very
low values of $\tilde{a}$ in the U = 0.5 case. The normalized storage capacity $\alpha_c a/S^2$ is represented, with
black dots from numerically solving Eqs. 16 and Eqs. 17 for two values of the sparseness: a = 0.3
and a = 0.0001; with color lines from the corresponding approximation given by Eq. 19.

The second modification reflects the notion that, although the function of long range
connections is to transmit information about the state of a local network to another one, this
transmission might not be perfectly efficient. We thus introduce an efficacy e, the probability
that, in the reduced Potts model, a given state of the pre-synaptic unit is connected with a
given state of the post-synaptic one.

Introducing these two modifications, the weights of the sparse Potts model become

where $c_{ij}^{kl}$ is 1 with probability $e\, c_M/N$ and 0 otherwise.
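
The two sources of dilution can be illustrated with a small sampling sketch. It assumes a two-stage construction: two units are connected with probability c_M/N, and, given that, each pair of active states is connected with probability e, so that any single entry equals 1 with probability e c_M/N. Whether the entries are otherwise independent is not specified in the text, so this layering, like the function name `dilution_mask`, is an assumption.

```python
import numpy as np

def dilution_mask(N, S, c_M, e, seed=0):
    """c[i, k, j, l] over active states k, l = 1..S; entries are 1 with probability e*c_M/N."""
    rng = np.random.default_rng(seed)
    units = rng.random((N, N)) < c_M / N           # are units i and j connected at all?
    np.fill_diagonal(units, False)                 # no self-connections
    states = rng.random((N, S, N, S)) < e          # state-to-state efficacy
    return (units[:, None, :, None] & states).astype(float)

c = dilution_mask(N=60, S=3, c_M=30, e=0.5)
print("fraction of nonzero entries:", c.mean())
```

Multiplying the Hebbian weights (restricted to active states) elementwise by such a mask gives a diluted network; the average number of units connected to a given unit is then close to c_M.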

The local field for the unit i and the state k can be decomposed into a signal, a noise and a
threshold part, just as in Eq. 6

hi = E 4''^-.' - (1 - = (1 - '^'^o) { (^-^ - ^)^^' + Nk-U] (23) 
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where 



m'! = 



CMea[ 

Generally, when studying highly diluted networks, the noise term A^^ can be treated directly 
as a uniform distributed random variable, because the states of different neurons are un- 
correlated. In this case, can not be considered as a random variable but rather as a 
weighted sum of normally distributed random variables rji, 
s 



Cm e(l — a)a 



- - ^a,o) \ = 'YiSik - a)rii 



The mean of rji is zero for all / and its standard deviation is 

NaPiqf 



ivf) 



with 



(1 — a)cM e 



Cm ea 



Note that mf and gf are analogous to and qpx used in Section 4. If cm e is large enough 
these quantities tend to be independent of i and k. 



Na{ 
1 



Following the analysis of highly diluted networks in the retrievable stable states of 
the network are given by the equations 



m 



1 + Ep^aexp 



where the local field, as in Eq. l23lis 



/ij = m w^p - f/(l - 5po) + Y 



aN qPi 



Cm e (1 - a) 
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These equations are equivalent to those obtained with the rephca method (which in the 
zero temperature hmit are Eqs. [121 and Eq. HH respectively) if one considers C = (and, 
thus, r = q) and an effective value of a given by ag// = p/ {cm e). 

Comparing this result with that for the fully connected model one notes that, as $a \to 0$,
the influence of C in the overall equations becomes negligible (this can be guessed already
in Eqs. 15). Therefore if the coding is very sparse, the fully connected and the highly diluted
networks become equivalent, and consequently also the intermediate networks. We show
this in Fig. 4. As the parameter a goes to zero, the storage capacity of the fully connected
and the highly diluted limit models converge.




Figure 4: A comparison of the storage capacity of a fully connected and of a highly diluted sparse
Potts network. Numerical solutions to the corresponding equations with U = 0.5. Left, the
dependence of the storage capacity, in the two cases, on the sparseness a, with S = 5. Right, the
dependence on the number of states per unit S, with a = 0.1. In both cases we plot the normalized
storage capacity, to focus only on the corrections to the $S^2/a$ behavior. Note that as $a \to 0$ the
storage capacity of the two types of network converges to the same result.
VI. INFORMATION CAPACITY

We have shown that the storage capacity of well behaved models scales roughly like $S^2/a$,
while in the two particular examples that we analyzed in full with the replica method, Eqs.
12 and 22, there is a correction that makes it

" ^) 

for high values of S and low values of a. We now discuss why this is reasonable in the general 
case from the information storage point of view. 

It is widely believed, though not proved, that autoassociative memory networks can store
a maximum of information equivalent to a fraction of a bit per synapse. In our model the
total number of synaptic variables is given by the different combinations of indexes of the
weights $J_{ij}^{kl}$,

$$ \text{number of synaptic variables} = N c_M S^2 e $$

On the other hand, the information in a retrieved pattern is N times the contribution of a
single unit, which, using the distribution in Eq. 1, can be bounded by Shannon's entropy

$$ H = -\sum_{x \in \text{distribution}} P(x) \ln P(x) = -\left[ (1 - a) \ln(1 - a) + a \ln(\tilde{a}) \right] $$

The upper bound on the retrievable information over p patterns is then

$$ \text{information} \leq -p N \left[ (1 - a) \ln(1 - a) + a \ln(\tilde{a}) \right] $$

The first term between brackets is negligible with respect to the second term provided a is
small enough and S is large enough. In this way, writing $\alpha = p/(c_M e)$, we can approximate

$$ \frac{\text{information}}{\text{number of synaptic variables}} \simeq -\frac{\alpha\, a \ln(\tilde{a})}{S^2} \leq -\frac{\alpha_c\, a \ln(\tilde{a})}{S^2} $$

This result, combined with Eq. 21, shows that the storage capacity of our model is
consistent with the idea that the information per synaptic variable is at most a fraction of
a bit.
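
The counting argument above can be turned into a few lines of arithmetic. The sketch assumes that the number of synaptic variables is N c_M S^2 e (one variable per connected pair of units and pair of active states) and that each unit contributes at most the entropy of a distribution with probability 1 - a for the zero state and a/S for each active state; the function names are hypothetical, and the result is converted from nats to bits.

```python
import numpy as np

def entropy_per_unit(a, S):
    """Shannon entropy in nats: zero state w.p. 1 - a, each of S active states w.p. a/S."""
    a_t = a / S
    return -((1.0 - a) * np.log(1.0 - a) + a * np.log(a_t))

def info_per_synaptic_variable(p, N, c_M, S, a, e=1.0):
    """Upper bound on retrievable information per synaptic variable, in bits."""
    nats = p * N * entropy_per_unit(a, S)
    return nats / (N * c_M * S**2 * e) / np.log(2.0)

# Example with arbitrary illustrative values.
print(info_per_synaptic_variable(p=2000, N=1000, c_M=1000, S=5, a=0.1))
```

For a binary network with a = 1/2 and S = 1 the entropy per unit is ln 2, so the bound reduces to p / c_M bits per synapse.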



VII. DISCUSSION 



The capacity to store information in any device, and in particular the capacity to store
concepts in the human brain, is limited. We have shown in a minimal model of semantic
memory, and in progressive steps, how one can expect the storage capacity to behave depending
on the parameters of the system: a global parameter (the sparseness a) and a local
parameter (the number of local retrieval states S or, in other words, the storage capacity
within a module). The $S^2/a$ behaviour, with its corresponding logarithmic corrections, can
be thought of as the combination of two separate results: the $a^{-1}$ behaviour due to sparseness
and the $S^2$ behaviour of the Potts model, which combine in a simple way. We have
shown, however, that it is not trivial to define a model that combines these aspects correctly,
and that the key is how the state operators are defined. From this study we have deduced
the minimum requirements of any model of this kind in order to have a high capacity. Furthermore,
through the argument of information capacity we present the well behaved family
as representative of general Hebbian models with the same degree of complexity.

The featural representation approach has been so far successful in explaining several
phenomena associated with semantic memory, like similarity priming, feature verification,
categorization and conceptual combination. The present work demonstrates that the
advantage of the use of features in allowing the representation of a large number of concepts
can be realized in a simple associative memory network. More quantitatively, our calculation
specifies that in the Potts model the number of concepts that can be stored is neither linear
nor an arbitrary power of the number S of values a feature can take, but quadratic
in S.

In the case of non-unitary sparseness, one can associate the necessity of introducing a 
threshold (U) term, whatever its exact form in the local field or the Hamiltonian, with 
a criterion of selectivity, which is actually observed in the representation of concepts in 
the brain, as pointed out in the introduction. The threshold behaviour, which is a typical 
characteristic of neurons, appears to be also necessary at the level of local networks in order 
to maintain activity low in the less representative modules. The origin of such a threshold has 
not been discussed in this paper. However, a comment on this issue can be made regarding 
the internal dynamics of local networks. One can show that, as extensively described in the
literature, only when the state of a local autoassociative network is driven by external
fields sufficiently close to an attractor (inside one of the S basins of attraction) may the local
system end up retrieving a pattern on its own, a process that from the global network
point of view corresponds to the activation of a unit. The local basin boundary acts in the
full system as an effective threshold, roughly equivalent to the simple U term we introduced
in the local field of our reduced system. Whether this threshold mechanism is enough,
or some addition must be made, can be assessed by studying, in the future, the complete
multimodular network without reducing it to Potts units.